9/13/2021

ISCA 2021

Advances in Spatial and Spatio-temporal Modeling and its Applications

Overview

  • Spatio-temporal reconstruction of climate from pollen.

    1. Non-linear functional relationship.

      • Computationally challenging.

      • Non-linear and non-Gaussian relationships among data.

    2. Using data/parameter augmentation to improve computation.

Pollen is a unique paleoclimate proxy

Modeling goal

  • Link the observed pollen counts to climate states during the modern period.

  • Use the learned relationship to predict unobserved climate state.

  • Generate climate histories that are local to the site of interest with uncertainty.

Tipton JR, Hooten MB, Nolan C, Booth RK, McLachlan J (2019). “Predicting paleoclimate from compositional data using multivariate Gaussian process inverse prediction.” The Annals of Applied Statistics, 13(4), 2363-2388.

Setting the stage

  • Spatially explicit reconstructions of climate variables are important.


    • Many important ecological questions are local.

    • Predictions at all locations remove the effects of sampling bias in paleoclimate reconstruction.

Setting the stage

  • Prior work – 4 pollen cores and total compute time of approximately 28 hours.


  • Currently: 363 sites:

    • non-linear response: compute time \(\approx\) 48 hours.
    • Pólya-gamma linear response: compute time less than 2 hours.

Holmström L, Ilvonen L, Seppä H, Veski S (2015). “A Bayesian spatiotemporal model for reconstructing climate from multiple pollen records.” The Annals of Applied Statistics, 9(3), 1194-1225.

Non-linear response model

Data model

  • Sediment samples from a lake.

  • Take 1cm\(^3\) cubes along the length of the sediment core.

  • In each cube, researcher counts the first \(M\) pollen grains and identifies to species/OTU.

  • Raw data are counts of each species.

  • Compositional count data.

Data Model

For the \(i\)th observation at location \(\mathbf{s}\) and time \(t\),


\[ \begin{align*} \mathbf{y}_i \left( \mathbf{s}, t \right) & = \left( y_{i1} \left( \mathbf{s}, t \right), \ldots, y_{ij} \left( \mathbf{s}, t \right) \right)' \end{align*} \]


is a \(J\)-dimensional compositional count observation.


  • \(y_{ij} \left( \mathbf{s}, t \right)\) is the count of species \(j\) in the \(i\)th sample at location \(\mathbf{s}\) and time \(t\).

Data Model

\[ \begin{align*} \mathbf{y}_i \left( \mathbf{s}, t \right) | \mathbf{p}\left( \mathbf{s}, t \right) & \sim \operatorname{Multinomial} \left( M_i \left( \mathbf{s}, t \right), \mathbf{p}\left( \mathbf{s}, t \right) \right). \end{align*} \]


  • \(M_i \left( \mathbf{s}, t \right) = \sum_{j=1}^J y_{ij}\left( \mathbf{s}, t \right)\) is the total count observed (fixed and known) for observation \(i\) at location \(\mathbf{s}\) and time \(t\).

  • Observation informative of the relative proportions \(p_{j} \left( \mathbf{s}, t \right)\) only.

Data Model: Overdispersion

  • The pollen data are highly variable and overdispersed.


  • Mixture over a Dirichlet distribution.


\[ \begin{align*} \mathbf{p}\left( \mathbf{s}, t \right) | \boldsymbol{\alpha}\left( \mathbf{s}, t \right) & \sim \operatorname{Dirichlet} \left( \boldsymbol{\alpha}\left( \mathbf{s}, t \right) \right). \end{align*} \]


  • Marginalize out \(\mathbf{p} \left( \mathbf{s}, t \right)\).

\[ \begin{align*} \mathbf{y}_i\left( \mathbf{s}, t \right) | \boldsymbol{\alpha}\left( \mathbf{s}, t \right) & \sim \operatorname{Dirichlet-Multinomial} \left( M_i\left( \mathbf{s}, t \right), \boldsymbol{\alpha}\left( \mathbf{s}, t \right) \right). \end{align*} \]


Data Model: Overdispersion

  • Model the Dirichlet-multinomial random effect using the log link function:

\[ \begin{align*} \operatorname{log} \left( \boldsymbol{\alpha} \left( \mathbf{s}, t \right) \right) & = f \left( \mathbf{z}\left( \mathbf{s}, t \right) \right) \boldsymbol{\beta}. \end{align*} \]


  • \(\mathbf{z}\left( \mathbf{s}, t \right)\)’ is a \(q\)-dimensional vector of climate variables.

  • \(f(\cdot)\) is some function of the climate state.

  • \(\boldsymbol{\beta}\) is a \(q \times J\) dimensional matrix of regression coefficients.


Response Function

\[ \begin{align*} \operatorname{log} \left( \boldsymbol{\alpha} \left( \mathbf{s}, t \right) \right) & = f \left( \mathbf{z}\left( \mathbf{s}, t \right) \right) \boldsymbol{\beta}. \end{align*} \]


  • \(f \left( \mathbf{z}\left( \mathbf{s}, t \right) \right)\) is a basis expansion of the covariates \(\mathbf{z}\left( \mathbf{s}, t \right)\).
    • Use B-splines or Gaussian Processes as a basis.
    • \(\mathbf{z}\left( \mathbf{s}, t \right)\) is unknown for \(t \neq 1\).
    • Computationally challenging.

Tipton JR, Hooten MB, Nolan C, Booth RK, McLachlan J (2019). “Predicting paleoclimate from compositional data using multivariate Gaussian process inverse prediction.” The Annals of Applied Statistics, 13(4), 2363-2388.

Hefley TJ, Broms KM, Brost BM, Buderman FE, Kay SL, Scharf HR, Tipton JR, Williams PJ, Hooten MB (2017). “The basis function approach for modeling autocorrelation in ecological data.” Ecology, 98(3), 632-646.

Calibration

  • \(\mathbf{z} \left( \mathbf{s}, t \right)\)s are observed only at \(t\) = 1.


  • Calibration: Estimate \(\boldsymbol{\beta}\) using: \[ \begin{align*} \mathbf{y} \left(1\right) & = \left( \mathbf{y} \left( \mathbf{s}_1, 1 \right), \ldots, \mathbf{y} \left( \mathbf{s}_n, 1 \right) \right)' \\ \mathbf{z} \left(1\right) & = \left( \mathbf{z} \left( \mathbf{s}_1, 1 \right), \ldots, \mathbf{z} \left( \mathbf{s}_n, 1 \right) \right)'. \end{align*} \]


  • Reconstruction:
    • Use estimated \(\boldsymbol{\beta}\)s and fossil pollen \(\mathbf{y} \left( t \right)\) to predict unobserved \(\mathbf{z}\left( t \right)\).


Dynamic Model

  • For \(\mathbf{z} \left(t \right) = \left( \mathbf{z} \left(\mathbf{s}_1, t \right)', \ldots, \mathbf{z} \left(\mathbf{s}_n, t \right)' \right)\), we assume:

\[ \begin{align*} \mathbf{z} \left(t \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} & = \mathbf{M}\left(t\right) \left( \mathbf{z} \left(t-1 \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} \right) + \boldsymbol{\eta} \left(t \right). \end{align*} \]


  • \(\mathbf{M}(t) = \rho \mathbf{I}_n\) is a propagator matrix.

  • \(\mathbf{X} \left(t \right) \boldsymbol{\gamma}\) are the fixed effects from covariates like latitude, elevation, etc.

  • \(\boldsymbol{\eta} \left( t \right) \sim \operatorname{N} \left( \mathbf{0}, \mathbf{R}\left( \boldsymbol{\theta} \right) \right)\).

  • \(\mathbf{R} \left( \boldsymbol{\theta} \right)\) is a Mátern spatial covariance matrix with parameters \(\boldsymbol{\theta}\).

Elevation

Scaling the process for big data

  • Define a set of spatial knot locations \(\mathbf{s}^{\star} = \left\{ \mathbf{s}_1^{\star}, \ldots, \mathbf{s}_m^{\star} \right\}\).

\[ \boldsymbol{\eta}^{\star} \left( t \right) \sim \operatorname{N} \left( \mathbf{0}, \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right) \right). \]

  • \(\mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)\) is the spatial covariance defined at the knot locations \(\mathbf{s}^{\star}\).

Banerjee S, Gelfand AE, Finley AO, Sang H (2008). “Gaussian predictive process models for large spatial data sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 825-848.

Predictive Process

  • Linear interpolator from observed locations \(\mathbf{s}\) to knot locations \(\mathbf{s}_j^{\star}\) is \[ \mathbf{r} \left(\mathbf{s}, \mathbf{s}_j^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1}, \] where \(\mathbf{r} \left(\mathbf{s}, \mathbf{s}_j^{\star} \right) = \operatorname{Cov} \left(\mathbf{s}, \mathbf{s}_j^{\star} \right)\).

  • \(\boldsymbol{\eta} \left( t \right) \approx \mathbf{r} \left(\mathbf{s}, \mathbf{s}^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1} \boldsymbol{\eta}^{\star} \left( t \right)\).


  • The dynamic climate process is approximated by

\[ \begin{align*} \mathbf{z} \left(t \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} & = \mathbf{M}\left(t\right) \left( \mathbf{z} \left(t-1 \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} \right) + \mathbf{r} \left(\mathbf{s}, \mathbf{s}^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1} \boldsymbol{\eta}^{\star} \left(t \right). \end{align*} \]

Implementation

gitHub package


  • Includes options for multiple models including:
    • mixture models.
    • different likelihoods and link functions.
    • correlations in functional response.


  • Leverages code in C++ using Rcpp package for computation speed.

Computational Details

  1. Sequential fitting

    1. Fit the calibration model.

    2. Use posterior distribution from stage (1) to generate predictions of climate independent in space and time.

    3. Smooth the posterior distribution from stage (2) using dynamic linear model.

      • Use posterior mean estimates which does not fully quantify model uncertainty.
  2. Fully Bayesian joint estimation.


Estimation of \(\mathbf{z} \left(\mathbf{s}, t \right)\)

  • High-dimensional spatio-temporal process.
    • Inefficient to sample with block Metropolis.
    • Poor mixing of MCMC chains.


  • Non-linear transformation in the data model.
    • Difficult to use Kalman Filtering.


  • Particle Filtering Methods.
    • Difficult to implement, suffer from degeneracy.
    • Posterior can be multi-modal.

Estimation of \(\mathbf{z} \left(\mathbf{s}, t \right)\)

  • Elliptical Slice Sampling.
    • Assumes a Gaussian prior.
    • Requires no tuning.
    • Efficiently samples in high dimensions.
    • Easily explores multiple modes.

  • Adaptive block Metropolis within Gibbs and Elliptical Slice Sampling algorithms.

  • Highly multi-modal posterior is efficiently explored within the sampler.

Murray I, Adams RP, MacKay DJC (2010). “Elliptical slice sampling.” In AISTATS, volume 13, 541-548.

Estimation of \(\mathbf{z} \left(\mathbf{s}, t \right)\)

  • For Gaussian process expansion of \(\mathbf{z} \left( \mathbf{s}, t \right)\):
    • The latent climate states are inputs into the covariance function.
    • Covariance input locations (and distance) are random.
    • Unique computational challenge.
      • Total computational cost is prohibitive \(O(d \frac{(Tn)^3}{3})\).
  • Proposed solution:
    • Predictive process representation.
    • Reduced computation cost \(O(d T \frac{m^3}{3})\).

Tipton JR, Hooten MB, Nolan C, Booth RK, McLachlan J (2019). “Predicting paleoclimate from compositional data using multivariate Gaussian process inverse prediction.” The Annals of Applied Statistics, 13(4), 2363-2388.

Reconstruction

Reconstruction over time

Linear Pólya-gamma model

Data model

\[\begin{align*} \mathbf{y}(\mathbf{s}, t) & \sim \operatorname{Multinomial}(M(\mathbf{s}, t), \pi_{SB}(\boldsymbol{\eta}(\mathbf{s}, t))). \end{align*}\]


  • \(\mathbf{y}(\mathbf{s}, t)\) is a \(J\) dimensional vector of counts at site \(\mathbf{s} \in \mathcal{D}\) and time \(t \in \{1, \ldots, n_t\}\).

  • Link the underlying climate states to the probability of observing species \(j\) through the latent variable \(\eta_j(\mathbf{s}, t)\).

Data model

  • \(\pi_{SB}(\boldsymbol{\eta}(\mathbf{s}, t))\) is a stick breaking transformation.


    • The map \(\pi_{SB}: \mathcal{R}^{J-1} \rightarrow \Delta^{J}\) transforms the \(J-1\) dimensional vector \(\boldsymbol{\eta}(\mathbf{s}, t)\) to the \(J\) dimensional unit simplex \(\mathcal{\Delta}^{J}\).

  • Other maps to the unit simplex could be used (i.e., multi-logit), but the stick-breaking map reduces computational cost.

Data model (ignoring spatio-temporal indexing)

\[\begin{align*} [\mathbf{y} | M, \pi_{SB}(\boldsymbol{\eta})] & =\operatorname{Multinomial}(\mathbf{y} | M, \pi_{SB}(\boldsymbol{\eta})) \\ & = \prod_{j=1}^{J-1} \operatorname{binomial}(y_j | M_j, \tilde{\pi}_j) \\ & = \prod_{j=}^{J-1} {M_j \choose y_j} \frac{(e^{\eta_j})^{y_j}}{(1 + e^{\eta_j})^{M_j} }. \end{align*}\]

  • \(M_j = M - \sum_{k < j} M_k\).

  • Define the partial probabilities \(\tilde{\pi}_j = \pi_{SB}(\boldsymbol{\eta})_j\) using the stick-breaking representation.

Data model (ignoring spatio-temporal indexing)

\[\begin{align*} \frac{(e^{\eta_j})^{y_j}}{(1 + e^{\eta_j})^{M_j} } & = 2^{-M_j} e^{\kappa(y_j) \eta_j} \int_0^\infty e^{- \omega_j \eta_j^2 / 2 } [\omega_j | M_j, 0] \,d \omega. \end{align*}\]


  • The integral identity is proportional to the product representation of the multinomial distribution.

  • The density \([\omega_j | M_j, 0]\) is a Pólya-gamma distribution.

Polson NG, Scott JG, Windle J (2013). “Bayesian inference for logistic models using Pólya-Gamma latent variables.” Journal of the American statistical Association.

Linderman S, Johnson M, Adams RP (2015). “Dependent multinomial models made easy: Stick-breaking with the Pólya-Gamma augmentation.” In Advances in Neural Information Processing Systems, 3456-3464.

Data model (ignoring spatio-temporal indexing)

  • With a prior \([\eta_j]\), joint density \([y_j, \eta_j]\) is

\[\begin{align*} [y_j, \eta_j] & = [\eta_j] {M_j \choose y_j} \frac{(e^{\eta_j})^{y_j}}{(1 + e^{\eta_j})^{M_j} }\\ & = \int_0^\infty [\eta_j] {M_j \choose y_j} 2^{-M_j} e^{\kappa(y_j) \eta_j} e^{- \omega_j \eta_j^2 / 2 } [\omega_j | M_j, 0] \,d \omega, \end{align*}\]


where the integral defines a joint density over \([\eta_j, y_j, \omega_j]\).

Data model

  • Using this integral representation, we have

    \[\begin{align*} \omega_j | \eta_j, y_j & \sim \operatorname{PG( M_j, \eta_j)}, \end{align*}\]

    which can be sampled using the exponential tilting property of the Pólya-gamma distribution.


  • If \([\eta_j]\) is Gaussian, then \([\eta_j | \omega_j, y_j]\) is also Gaussian which enables conjugate sampling.

Data model

  • There is a cost:


    • Requires sampling the \(\omega_j\)s and these are computationally expensive.

    • However, these are also embarrassingly parallel.

    • Efficiently sampled using openMP parallelization.

  • For all examples tested so far with reduced-rank spatial processes, sampling Pólya-gamma random variables is the limiting computational cost.

Process model – functional relationship

\[\begin{align*} \eta_j(\mathbf{s}, t) = \color{blue}{\beta_{0j}} + \color{blue}{\beta_{1j}} \left( \mathbf{x}'(\mathbf{s}, t) \boldsymbol{\gamma} + \mathbf{w}'(\mathbf{s}) \boldsymbol{\alpha}(t) \right) + \color{blue}{\varepsilon(\mathbf{s}, t)}. \end{align*}\]


  • \(\color{blue}{\beta_{0j}}\) and \(\color{blue}{\beta_{1j}}\) are regression coefficients with respect to the climate state \(\mathbf{Z}(\mathbf{s}, t) = \left( \mathbf{x}'(\mathbf{s}, t) \boldsymbol{\gamma} + \mathbf{w}'(\mathbf{s}) \boldsymbol{\alpha}(t) \right)\).

  • \(\color{blue}{\varepsilon(\mathbf{s}, t)} \stackrel{iid}{\sim} \operatorname{N}(0, \sigma^2_j)\) models overdispersion relative to the linear regression response.

Process model – climate process

\[\begin{align*} \eta_j(\mathbf{s}, t) = \beta_{0j} + \beta_{1j} \left( \color{red}{\mathbf{x}'(\mathbf{s}, t) \boldsymbol{\gamma}} + \color{purple}{\mathbf{w}'(\mathbf{s}) \boldsymbol{\alpha}(t)} \right) + \varepsilon(\mathbf{s}, t). \end{align*}\]

\(\color{red}{\mathbf{X}(t) = \begin{pmatrix} \mathbf{x}'(\mathbf{s}_1, t) \\ \vdots \\ \mathbf{x}'(\mathbf{s}_n, t) \end{pmatrix}}\) are fixed covariates (elevation, latitude, etc.).

  • We assume \(\color{red}{\mathbf{X}(t) \equiv \mathbf{X}}\) for all \(t\) although temporally varying covariates are possible (volcanic forcings, Milankovitch cycles, etc.).

  • \(\color{purple}{\mathbf{W} = \begin{pmatrix} \mathbf{w}'(\mathbf{s}_1) \\ \vdots \\ \mathbf{w}'(\mathbf{s}_n) \end{pmatrix}}\) are spatial basis functions with temporal random effects \(\color{purple}{\boldsymbol{\alpha}(t)}\).

  • \(\mathbf{Z}_0 \sim \operatorname{N} (\color{red}{\mathbf{X}'(1) \boldsymbol{\gamma}} + \color{purple}{\mathbf{W} \boldsymbol{\alpha}(1)}, \sigma^2_0 \mathbf{I})\) is the observed modern climate state.

Process model – \(\color{purple}{\mbox{random effects}}\)

  • Challenge: scaling to 1000s of sites and 15,000 years (\(\approx 60\) time increments).

  • Sparse, multiresolution representation with a dynamic linear model.

\[\begin{align*} \color{purple}{\mathbf{w}'(\mathbf{s}) \boldsymbol{\alpha}(t)} = \color{purple}{\sum_{m=1}^M \mathbf{w}_m'(\mathbf{s}) \boldsymbol{\alpha}_m(t)}. \end{align*}\]


  • \(\color{purple}{\mathbf{w}_m'(\mathbf{s})}\) is a Wendland basis over resolution \(m\).

Process model – \(\color{purple}{\mbox{random effects}}\)



Nychka D, Bandyopadhyay S, Hammerling D, Lindgren F, Sain S (2015). “A multiresolution Gaussian process model for the analysis of large spatial datasets.” Journal of Computational and Graphical Statistics, 24(2), 579-599.

Process model – \(\color{purple}{\mbox{random effects}}\)

\[\begin{align*} \color{purple}{\boldsymbol{\alpha}_m(t)} & \sim \operatorname{N} \left( \mathbf{A}_m \color{purple}{\boldsymbol{\alpha}_m(t-1)}, \tau^2 \mathbf{Q}_m^{-1} \right). \end{align*}\]


  • \(\mathbf{A}_m\) is the propagator matrix (assume \(\mathbf{A}_m \equiv \rho \mathbf{I}\) for all \(m\)).

  • \(\mathbf{Q}_m\) is the precision matrix constructed from either a CAR or SAR process over the \(m\)th resolution.

Thanks for your attention

References

  • Banerjee S, Gelfand AE, Finley AO, Sang H (2008). “Gaussian predictive process models for large spatial data sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 825-848.
  • Murray I, Adams RP, MacKay DJC (2010). “Elliptical slice sampling.” In AISTATS, volume 13, 541-548.
  • Polson NG, Scott JG, Windle J (2013). “Bayesian inference for logistic models using Pólya-Gamma latent variables.” Journal of the American statistical Association.
  • Maclaurin D, Adams RP (2014). “Firefly Monte Carlo: Exact MCMC with subsets of data.” arXiv preprint arXiv:1403.5693.
  • Holmström L, Ilvonen L, Seppä H, Veski S (2015). “A Bayesian spatiotemporal model for reconstructing climate from multiple pollen records.” The Annals of Applied Statistics, 9(3), 1194-1225.
  • Linderman S, Johnson M, Adams RP (2015). “Dependent multinomial models made easy: Stick-breaking with the Pólya-Gamma augmentation.” In Advances in Neural Information Processing Systems, 3456-3464.
  • Nychka D, Bandyopadhyay S, Hammerling D, Lindgren F, Sain S (2015). “A multiresolution Gaussian process model for the analysis of large spatial datasets.” Journal of Computational and Graphical Statistics, 24(2), 579-599.
  • Hefley TJ, Broms KM, Brost BM, Buderman FE, Kay SL, Scharf HR, Tipton JR, Williams PJ, Hooten MB (2017). “The basis function approach for modeling autocorrelation in ecological data.” Ecology, 98(3), 632-646.
  • Hooten MB, Johnson DS, Brost BM (2019). “Making Recursive Bayesian Inference Accessible.” The American Statistician, 1-10.
  • Nolan C, Tipton J, Booth RK, Hooten MB, Jackson ST (2019). “Comparing and improving methods for reconstructing peatland water-table depth from testate amoebae.” The Holocene, 29(8), 1350-1361.
  • Tipton JR, Hooten MB, Nolan C, Booth RK, McLachlan J (2019). “Predicting paleoclimate from compositional data using multivariate Gaussian process inverse prediction.” The Annals of Applied Statistics, 13(4), 2363-2388.